1.  Introduction.

Let $\hat f(\cdot\,|\,h)$ be a nonparametric kernel estimator of an unknown
density $f$, with bandwidth (window size) $h$. A considerable amount has been
written about "optimal" selection of bandwidth, usually in the context of
minimising $L^2$ error (see e.g. Fryer [6], Wegman [23]). In particular, there
are well-known asymptotic formulae for the window $h_f$ which minimises mean
integrated square error for a given $f$ (see Parzen [14], Rosenblatt [17]). Of
course, $h_f$ depends intimately on the unknown density, and so is not a
practical choice. Furthermore, a statistician who has been given a sample to
analyse should really be interested in minimising integrated square error for
that particular sample, not in minimising the average error over all possible
samples. Unfortunately the window $\hat h_f$ which minimises integrated square
error is also an intricate function of the unknown $f$.
Any practical method of constructing a bandwidth must depend only on the
sample, and should produce some sort of "estimate" of $\hat h_f$. The purpose
of this paper is to show that there are well-defined limits to the accuracy of
all data-driven bandwidth estimates. Put another way, there is an unbridgeable
gap between the minimum integrated square error attained using the optimal
bandwidth $\hat h_f$, and the minimum achievable integrated square error using
a data-driven bandwidth estimate.

We pause now to introduce notation. Let $X_1, \ldots, X_n$ be a random sample
from an unknown density $f$, and let $K$ be a kernel function. Here and during
most of this paper we work in one dimension, although extensions to higher
dimensions will be indicated at the end of Section 2. We assume at least that
$K$ is continuous with compact support, and is constructed to suit a density
$f$ with $t$ ($\ge 2$) bounded derivatives:

(1.1)  $$\int z^i K(z)\,dz = \begin{cases} 1 & \text{if } i = 0, \\ 0 & \text{if } 1 \le i \le t-1, \\ d_K & \text{if } i = t, \end{cases}$$

where $d_K \ne 0$. (The most common case, where $K$ is symmetric and positive,
has $t = 2$ in these specifications; see Parzen [14] and Bartlett [1] for
discussions of the general case.) Our density estimate is

$$\hat f(x\,|\,h) = (nh)^{-1} \sum_{i=1}^{n} K\{(x - X_i)/h\}, \qquad -\infty < x < \infty,$$

and has integrated square error

$$\Delta(h,f) \equiv \int \{\hat f(x\,|\,h) - f(x)\}^2\,dx.$$

Mean integrated square error is given by

$$M(h,f) = E\{\Delta(h,f)\}.$$

Define also:

$$D(h,f) = \Delta(h,f) - M(h,f).$$
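The three quantities just defined can be evaluated numerically when the density is known. The following is a minimal sketch under stated assumptions, not part of the original argument: the triweight kernel (one compactly supported kernel with $t = 2$), the Beta(4, 4) test density, the integration grid and all function names are choices made for illustration only.

```python
import numpy as np

def triweight(z):
    """Compactly supported kernel with two continuous derivatives (order t = 2)."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) <= 1.0, (35.0 / 32.0) * (1.0 - z**2) ** 3, 0.0)

def trapezoid(y, x):
    """Trapezoid rule, written out so the sketch does not depend on a NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def kde(x, sample, h, kernel=triweight):
    """f_hat(x | h) = (n h)^{-1} sum_i K{(x - X_i) / h}."""
    x = np.asarray(x, dtype=float)
    sample = np.asarray(sample, dtype=float)
    z = (x[:, None] - sample[None, :]) / h
    return kernel(z).sum(axis=1) / (sample.size * h)

def ise(sample, h, f, grid):
    """Delta(h, f): integrated square error for this particular sample."""
    return trapezoid((kde(grid, sample, h) - f(grid)) ** 2, grid)

def mise(h, f, draw, n, grid, reps=200, seed=0):
    """Monte Carlo approximation to M(h, f) = E{Delta(h, f)}."""
    rng = np.random.default_rng(seed)
    return float(np.mean([ise(draw(rng, n), h, f, grid) for _ in range(reps)]))

def beta44(x):
    """Hypothetical test density: Beta(4, 4), supported on [0, 1]."""
    x = np.asarray(x, dtype=float)
    inside = (x >= 0.0) & (x <= 1.0)
    return np.where(inside, 140.0 * x**3 * (1.0 - x) ** 3, 0.0)

def draw_beta44(rng, n):
    """Draw a sample of size n from the hypothetical test density."""
    return rng.beta(4.0, 4.0, size=n)
```

With these helpers, `ise(sample, h, beta44, grid) - mise(h, beta44, draw_beta44, n, grid)` approximates $D(h,f)$ for one sample, up to Monte Carlo and quadrature error.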

Assume $f$ has at least $t$ continuous derivatives, and suppose for the sake
of argument that $f$ vanishes outside a compact interval. (This property
permits us to avoid cumbersome regularity conditions, but is not essential.)
Then the "optimal fixed bandwidth" $h_f$ minimises $M(h,f)$ and is asymptotic
to a constant multiple of $n^{-1/(2t+1)}$, and the "optimal bandwidth"
$\hat h_f$ minimises $\Delta(h,f)$ and satisfies $\hat h_f / h_f \to 1$ in
probability as $n \to \infty$. Any practical procedure for constructing a
bandwidth produces a random variable $\hat h$ which is a function solely of
the sample; it clearly must not depend on the unknown $f$. A statistician who
claims that a certain procedure $\hat h$ is "best possible" is really saying:
"In some sense, the closest you can come to minimising $\Delta(h,f)$ is
$\Delta(\hat h,f)$". Of course, $\Delta(\hat h,f)$ exceeds the true minimum
$\Delta(\hat h_f,f)$, but we cannot realistically expect to close that gap. It
is known [9] that if $\hat h_c$ is the cross-validatory window, then
$n\{\Delta(\hat h_c,f) - \Delta(\hat h_f,f)\}$ has an asymptotic chi-square
distribution with one degree of freedom. Therefore the distance between
$\Delta(\hat h,f)$ and $\Delta(\hat h_f,f)$ can be reduced to at least order
$n^{-1}$. In Section 2 we shall show that in a minimax sense, order $n^{-1}$
is a lower bound as well as an upper bound. From this point of view,
least-squares cross-validation is second-order optimal; it is already known
to be first-order optimal [7,8,21,4].

Throughout this discussion we have assessed optimality on the $\Delta$-scale,
not the $h$-scale. However, the two are interchangeable. To see this, observe
that if the kernel $K$ has two continuous derivatives then we may expand
$\Delta(\hat h,f)$ in a Taylor series about $\hat h_f$, obtaining:

$$\Delta(\hat h,f) = \Delta(\hat h_f,f) + (\hat h - \hat h_f)\,\Delta^{(1)}(\hat h_f,f) + \tfrac12 (\hat h - \hat h_f)^2\,\Delta^{(2)}(h^*,f),$$

where $h^*$ lies in between $\hat h$ and $\hat h_f$, and
$\Delta^{(i)}(h,f) = (\partial/\partial h)^i \Delta(h,f)$.
Since $\hat h_f$ minimises $\Delta(\cdot,f)$, the first-order term vanishes and

$$\Delta(\hat h,f) - \Delta(\hat h_f,f) = \tfrac12 (\hat h - \hat h_f)^2\,\Delta^{(2)}(h^*,f).$$

Suppose the data-driven bandwidth $\hat h$ has at least a chance of being
"good", so that $\hat h/\hat h_f \to 1$ in probability. Then it may be shown,
under the conditions stated earlier about $f$, that as $n \to \infty$,

$$n^{2(t-1)/(2t+1)}\,\Delta^{(2)}(h^*,f) \to c(f,K) > 0$$

in probability. In fact,

$$c(f,K) = \lim_{n\to\infty} n^{2(t-1)/(2t+1)}\,M^{(2)}(h_f,f).$$

Therefore

$$\Delta(\hat h,f) - \Delta(\hat h_f,f) = \tfrac12\,c(f,K)\,n^{-2(t-1)/(2t+1)}\,(\hat h - \hat h_f)^2\,\{1 + o_p(1)\}.$$

It follows from this expansion that whenever
$\Delta(\hat h,f) - \Delta(\hat h_f,f)$ is of order $n^{-1}$, we also have
$\hat h - \hat h_f$ of order $n^{-3/\{2(2t+1)\}}$. Furthermore the fact that
$n^{-1}$ cannot be improved upon is equivalent to the statement that $\hat h$
and $\hat h_f$ must be at least $n^{-3/\{2(2t+1)\}}$ apart in some sense.
Therefore a procedure $\hat h$ for which
$|\hat h - \hat h_f| \sim n^{-3/\{2(2t+1)\}}$ is "best possible".

It is instructive to specialise these formulae to the important case $t = 2$,
where the kernel is usually taken to be positive. There, the bandwidths
$\hat h$ and $\hat h_f$ are both asymptotic to a constant multiple of
$n^{-1/5}$, and (we are claiming) their distance apart is at least
$n^{-3/10}$, in a minimax sense. Therefore the fastest rate of convergence of
$\hat h$ to $\hat h_f$ is excruciatingly slow: $(\hat h/\hat h_f) - 1$ can be
no smaller than order $n^{-1/10}$, in a minimax sense.
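The exponent bookkeeping behind these rates is easy to check mechanically. The sketch below is only an illustration (the function name and the dictionary keys are our own labels): it takes the orders stated in the text, namely bandwidths of size $n^{-1/(2t+1)}$, a second derivative of integrated square error of size $n^{-2(t-1)/(2t+1)}$ and a gap of order $n^{-1}$ on the error scale, and solves the quadratic expansion for the distance between bandwidths.

```python
from fractions import Fraction

def bandwidth_rate_exponents(t):
    """Return exponents e such that each named quantity is of order n**(-e)."""
    if t < 2:
        raise ValueError("the kernel order t must be at least 2")
    bandwidth = Fraction(1, 2 * t + 1)            # size of the optimal bandwidth
    curvature = Fraction(2 * (t - 1), 2 * t + 1)  # size of the second derivative of ISE
    ise_gap = Fraction(1)                         # gap in ISE: order n^{-1}
    # gap = (1/2) * curvature * distance**2, so the exponents satisfy
    # ise_gap = curvature + 2 * distance.
    distance = (ise_gap - curvature) / 2
    relative = distance - bandwidth               # order of (h_hat / h_hat_f) - 1
    return {"bandwidth": bandwidth, "distance": distance, "relative": relative}

rates = bandwidth_rate_exponents(2)
print(rates["bandwidth"], rates["distance"], rates["relative"])
```

For $t = 2$ this returns the exponents $1/5$, $3/10$ and $1/10$ quoted above; for general $t$ the distance exponent equals $3/\{2(2t+1)\}$ and the relative exponent equals $1/\{2(2t+1)\}$.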

For most of this paper we discuss our results on the $h$-scale, not the
$\Delta$-scale, since we feel statisticians are more familiar with bandwidth
than they are with integrated square error. The statistician must make an
explicit choice of bandwidth, but only chooses integrated square error
indirectly. Our main results will be formulated in Section 2, and proved
in the ensuing two sections. Section 3 will give introductory lemmas, while
Section 4 will present main proofs.


It is worth pointing out that our results (as well as their proofs) are
quite different in character from traditional works on "optimal rates of
convergence" for nonparametric density estimators [5,10,12,13,18,19,22,2].
The classical argument involves showing that a certain kernel estimator
(for example) is asymptotically optimal in the class of all possible density
estimators; that class includes orthogonal series estimators, spline
estimators, etc. But in our case we confine attention not only to kernel
estimators, but to kernel estimators constructed using a specific, fixed
kernel $K$. The only variable is the bandwidth in that special estimator.
We are, in effect, switching attention from the problem of "best estimates"
of a density, to that of "best estimates" of the bandwidth $\hat h_f$. But
$\hat h_f$ is a random variable, and our problem of "estimating a random
variable" is quite different from that of estimating a density function. The
work of Rice [16] is perhaps closest in spirit to this paper and [9],
although Rice did not view the minimiser of integrated square error as the
benchmark bandwidth. Rice's work is for the case of nonparametric regression,
and a sequel to our paper will describe analogues of our results in that
context.


2.  Main  results. 


Minimax theory is usually developed by assessing performance over a
specific "test class" $\Theta$ of distributions. It is clear that if
$\Theta'$ is any class containing $\Theta$, then the worst performance over
$\Theta'$ is at least as bad as the worst performance over $\Theta$.
Therefore a basic result about distributions in $\Theta$ may be generalized
in many ways.

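To define $\Theta$, we begin with any compactly supported density $f_0$
having $t+2$ derivatives on $(-\infty,\infty)$ and (for convenience)
satisfying $f_0(x) = c_0 > 0$ for $x \in [0,1]$. Define
$c_1 = \sup_{x;\, j \le t+2} |f_0^{(j)}(x)|$. Let $\psi$ be any function on
$[0,\frac12]$ which has $t+2$ derivatives and satisfies
$\sup_{0<x<1/2} |\psi^{(j)}(x)| \le \frac12 c_1$, $|\psi^{(j)}(\cdot)| > 0$
[index and argument of this condition are illegible in the source], and
$\psi^{(j)}(0) = \psi^{(j)}(\frac12) = 0$ for $0 \le j \le t+2$. Set
$\psi(x) = -\psi(1-x)$ for $x \in [\frac12,1]$, and extend $\psi$ from
$[0,1]$ to $(-\infty,\infty)$ by periodicity. Let $m$ equal the integer part
of $n^{1/(2t+1)}$, and define

$$\gamma(x) = \gamma(x,n) = \begin{cases} m^{-t}\,\psi(mx) & \text{for } 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

For $v = 0, \ldots, m-1$, let $\gamma_v(x) = \gamma(x)$ on
$C_v = [v m^{-1}, (v+1) m^{-1}]$, and $\gamma_v(x) = 0$ off $C_v$.
Let $\{\tau_v,\ 0 \le v \le m-1\}$ be any sequence of length $m$ all of whose
elements are zeros and ones, and let

$$\theta(x) = \theta(\tau_0, \ldots, \tau_{m-1})(x) = f_0(x)\Big\{1 + \sum_v \tau_v \gamma_v(x)\Big\}, \qquad -\infty < x < \infty.$$

The set $\Theta = \Theta(n)$ is defined to be the class of all such functions
$\theta$.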

The elements of $\Theta$ are all probability densities with support equal to
the support of $f_0$ and satisfying

$$\sup_{x;\, j \le t} |\theta^{(j)}(x)| \le c_1.$$

In particular, the $t$'th derivatives of densities in $\Theta$ are all
uniformly bounded. The kernel $K$ specified by (1.1) is designed for just
this type of density.

We are now in a position to state our main theorem. Let $K$ be any
compactly supported kernel satisfying (1.1), and having two Hölder-continuous
derivatives on $(-\infty,\infty)$. Let $\hat h$ be a data-driven bandwidth
estimate. Any positive function of the sample $X_1, \ldots, X_n$ is a
candidate for $\hat h$. Recall that $\hat h_\theta$ is the bandwidth which
minimises $\Delta(h,\theta)$.

THEOREM 2.1. Under the above conditions on $K$,

(2.1)  $$\lim_{\epsilon\to 0}\ \liminf_{n\to\infty}\ \sup_{\theta\in\Theta} P_\theta\big(|\hat h - \hat h_\theta| > \epsilon\, n^{-3/\{2(2t+1)\}}\big) = 1.$$

In this sense, no data-driven bandwidth can get closer than order
$n^{-3/\{2(2t+1)\}}$ to $\hat h_\theta$.

Next we introduce the cross-validatory bandwidth. Let $\hat f_i$ denote the
kernel estimate obtained by leaving out the $i$'th sample value:

$$\hat f_i(x\,|\,h) = \{(n-1)h\}^{-1} \sum_{j \ne i} K\{(x - X_j)/h\}.$$

Define

$$\delta(h,\theta) \equiv 2\int \hat f(x\,|\,h)\,\theta(x)\,dx - 2 n^{-1} \sum_{i=1}^{n} \hat f_i(X_i\,|\,h),$$

(2.2)  $$CV(h) = \Delta(h,\theta) + \delta(h,\theta) - \int \theta^2 = \int \hat f^2(x\,|\,h)\,dx - 2 n^{-1} \sum_{i=1}^{n} \hat f_i(X_i\,|\,h).$$

The cross-validatory window, $\hat h_c$, is that value of $h$ which minimises
$CV(h)$.
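Formula (2.2) involves only the sample, so it can be computed directly. The following is a minimal sketch, assuming a triweight kernel, trapezoid-rule integration on a user-supplied grid covering the support of the estimate, and minimisation over a finite list of candidate bandwidths; these are illustrative choices (as are all function names), not the procedure analysed in the theorems.

```python
import numpy as np

def triweight(z):
    """Compactly supported kernel with two continuous derivatives (order t = 2)."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) <= 1.0, (35.0 / 32.0) * (1.0 - z**2) ** 3, 0.0)

def trapezoid(y, x):
    """Trapezoid rule on an arbitrary increasing grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def kde(x, sample, h):
    """f_hat(x | h) = (n h)^{-1} sum_i K{(x - X_i) / h}."""
    z = (np.asarray(x, dtype=float)[:, None] - sample[None, :]) / h
    return triweight(z).sum(axis=1) / (sample.size * h)

def leave_one_out(sample, h):
    """f_hat_i(X_i | h) = {(n - 1) h}^{-1} sum_{j != i} K{(X_i - X_j) / h}."""
    k = triweight((sample[:, None] - sample[None, :]) / h)
    np.fill_diagonal(k, 0.0)  # drop the j = i terms
    return k.sum(axis=1) / ((sample.size - 1) * h)

def cv_score(sample, h, grid):
    """CV(h) = int f_hat^2(x | h) dx - 2 n^{-1} sum_i f_hat_i(X_i | h)."""
    sample = np.asarray(sample, dtype=float)
    integral = trapezoid(kde(grid, sample, h) ** 2, grid)
    return integral - 2.0 * float(np.mean(leave_one_out(sample, h)))

def cv_bandwidth(sample, candidates, grid):
    """Candidate bandwidth with the smallest cross-validation score."""
    scores = [cv_score(sample, h, grid) for h in candidates]
    return float(candidates[int(np.argmin(scores))])
```

A grid search is used only to keep the sketch short; in practice a finer search around the best candidate, or a numerical minimiser, would be applied to `cv_score`.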
Our next theorem is the natural complement of Theorem 2.1, in the case
where $\hat h = \hat h_c$.

THEOREM 2.2. Under the same conditions on $K$,

(2.3)  $$\lim_{\lambda\to\infty}\ \limsup_{n\to\infty}\ \sup_{\theta\in\Theta} P_\theta\big(|\hat h_c - \hat h_\theta| > \lambda\, n^{-3/\{2(2t+1)\}}\big) = 0.$$

Thus, $\hat h_c$ is as close to $\hat h_\theta$ as it is possible to get, in
a minimax sense.

Result (2.3) fails to hold if $\Theta$ is replaced by the class of densities
$f$ with $t$ uniformly bounded derivatives, or even by the class
$C(B_0, \ldots, B_t)$ of all compactly supported densities satisfying

$$\sup_{-\infty<x<\infty} |f^{(i)}(x)| \le B_i, \qquad 0 \le i \le t,$$

for given constants $B_0, \ldots, B_t$. To see this, suppose $Z$ has density
$f \in C(B)$ and let $Z_\rho = \rho Z$ for $\rho > 1$. The density $f_\rho$
of $Z_\rho$ is in $C(B)$, and since scalar expansion of the data leads to an
identical expansion of both $\hat h_c$ and $\hat h_f$, we have:

$$P_{f_\rho}\big(|\hat h_c - \hat h_{f_\rho}| > \lambda\, n^{-3/\{2(2t+1)\}}\big) = P_f\big(|\hat h_c - \hat h_f| > \rho^{-1} \lambda\, n^{-3/\{2(2t+1)\}}\big).$$

Consequently,

$$\sup_{f\in C(B)} P_f\big(|\hat h_c - \hat h_f| > \lambda\, n^{-3/\{2(2t+1)\}}\big) = 1$$

for each $\lambda > 0$ and each $n \ge 1$. (A similar property may be
observed if we work on the $\Delta$-scale instead of the $h$-scale.)
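The scaling property used in this argument, that expanding the data by a factor $\rho$ expands the cross-validatory window by the same factor, can be checked numerically. The sketch below is an illustration under our own assumptions (triweight kernel, trapezoid integration, a finite candidate grid, hypothetical function names); the candidate bandwidths and the integration grid are rescaled together with the data.

```python
import numpy as np

def triweight(z):
    """Compactly supported kernel with two continuous derivatives (order t = 2)."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) <= 1.0, (35.0 / 32.0) * (1.0 - z**2) ** 3, 0.0)

def cv_score(sample, h, grid):
    """CV(h) = int f_hat^2 dx - 2 n^{-1} sum_i f_hat_i(X_i | h), by the trapezoid rule."""
    n = sample.size
    f_hat = triweight((grid[:, None] - sample[None, :]) / h).sum(axis=1) / (n * h)
    sq = f_hat**2
    integral = float(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(grid)))
    k = triweight((sample[:, None] - sample[None, :]) / h)
    np.fill_diagonal(k, 0.0)  # leave-one-out: drop the j = i terms
    loo = k.sum(axis=1) / ((n - 1) * h)
    return integral - 2.0 * float(np.mean(loo))

def cv_bandwidth(sample, candidates, grid):
    """Candidate bandwidth with the smallest cross-validation score."""
    scores = [cv_score(sample, h, grid) for h in candidates]
    return float(candidates[int(np.argmin(scores))])

def scaled_cv_bandwidth(sample, candidates, grid, rho):
    """Cross-validatory window for the expanded data rho * X_1, ..., rho * X_n."""
    return cv_bandwidth(rho * sample, rho * candidates, rho * grid)
```

Since $CV$ computed from the expanded data at bandwidth $\rho h$ equals $\rho^{-1}$ times $CV$ computed from the original data at $h$, the minimiser is simply multiplied by $\rho$; the same holds for the minimiser of integrated square error, which is why the probability bound cannot be uniform over a class closed under such expansions.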

There are several ways of re-defining $C(B)$ so as to avoid this type of
behaviour. For example, we might insist that densities in $C(B)$ be above a
certain level over an interval of predetermined length. However, we prefer
to avoid the obscuring technicalities involved in this specification by
using the same test class $\Theta$ to measure both upper and lower bounds to
performance.


Theorems 2.1 and 2.2 have analogues on the $\Delta$-scale. We state them
together here, without proofs. Once again, $\hat h$ denotes an arbitrary
data-driven window.

THEOREM 2.3. Under the same conditions on $K$,

$$\lim_{\epsilon\to 0}\ \liminf_{n\to\infty}\ \sup_{\theta\in\Theta} P_\theta\{\Delta(\hat h,\theta) - \Delta(\hat h_\theta,\theta) > \epsilon\, n^{-1}\} = 1,$$

$$\lim_{\lambda\to\infty}\ \limsup_{n\to\infty}\ \sup_{\theta\in\Theta} P_\theta\{\Delta(\hat h_c,\theta) - \Delta(\hat h_\theta,\theta) > \lambda\, n^{-1}\} = 0.$$

To obtain analogues of these results for $p$-dimensional density estimators,
modify the class $\Theta$ along the lines of Stone [20]. Theorem 2.3
continues to hold without change.


3.  Preparatory  lemmas. 

In this and the next section, the symbols $C, C_1, C_2, \ldots$ denote generic
positive constants. $\tilde E$ denotes the complement of an event $E$.
Superscript notation in $\Delta^{(i)}$, $\delta^{(i)}$, $D^{(i)}$ and
$M^{(i)}$ indicates differentiation with respect to bandwidth, $h$. We keep
our proofs very brief, leaving out all arguments whose development closely
parallels work in [9]. There are no essential differences between arguments
for different values of $t$, and so we work only with $t = 2$, to simplify
notation.

Several useful, intuitively obvious technical properties of densities from
$\Theta$ are summarised in our first lemma. The proofs are tedious but
straightforward, and so we give only an outline.


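LEMMA 3.1. Take $t = 2$ in all that follows. Then: for some $n_0 > 0$,

(3.1)  $$0 < \inf_{n>n_0,\ \theta\in\Theta} n^{1/5} h_\theta \le \sup_{n>n_0,\ \theta\in\Theta} n^{1/5} h_\theta < \infty;$$

for any $\epsilon > 0$ there exists $\eta = \eta(\epsilon) > 0$ such that

(3.2)  $$\inf_{|h - h_\theta| > \epsilon n^{-1/5}} M(h,\theta) \ge (1+\eta)\,M(h_\theta,\theta)$$

for all $\theta \in \Theta$ and all large $n$; for some $n_0 > 0$,

(3.3)  $$0 < \inf_{n>n_0,\ \theta\in\Theta} n^{2/5} M^{(2)}(h_\theta,\theta) \le \sup_{n>n_0,\ \theta\in\Theta} n^{2/5} M^{(2)}(h_\theta,\theta) < \infty;$$

for any $\epsilon > 0$ there exists $\eta = \eta(\epsilon) > 0$ such that

(3.4)  $$\sup_{|h - h_\theta| \le \epsilon n^{-1/5}} |M^{(2)}(h,\theta) - M^{(2)}(h_\theta,\theta)| \le \eta(\epsilon)\, n^{-2/5}$$

for all $\theta \in \Theta$ and all large $n$, and $\eta(\epsilon) \to 0$ as
$\epsilon \to 0$.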

OUTLINE OF PROOF: Write

$$M(h,\theta) = V(h,\theta) + B(h,\theta),$$

where

$$V(h,\theta) = n^{-1} h^{-1} \iint K(u)^2\, \theta(x - hu)\,du\,dx - n^{-1} \int \Big[\int K(u)\,\theta(x - hu)\,du\Big]^2 dx,$$

$$B(h,\theta) = \int \Big[\int K(u)\{\theta(x - hu) - \theta(x)\}\,du\Big]^2 dx.$$

The derivatives $M^{(1)}(h,\theta)$ and $M^{(2)}(h,\theta)$ may be studied by
differentiating $V(h,\theta)$, then approximating as in Rosenblatt [17], and
by differentiating $B(h,\theta)$ and using a Taylor expansion with integral
form of the remainder. □


Proofs of Lemmas 3.2 and 3.3 below closely parallel those of Lemmas 3.1
and 3.2 in [9]. In establishing (3.10), note (3.1).


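LEMMA 3.2. For each $0 < a < b < \infty$ and all positive integers $\ell$,

(3.5)  $$\sup_{n,\ \theta\in\Theta,\ a<t<b} E_\theta\big|n^{7/10} D^{(1)}(n^{-1/5}t,\theta)\big|^{2\ell} \le C_1(a,b,\ell),$$

(3.6)  $$\sup_{n,\ \theta\in\Theta,\ a<t<b} E_\theta\big|n^{7/10} \delta^{(1)}(n^{-1/5}t,\theta)\big|^{2\ell} \le C_1(a,b,\ell).$$

Furthermore, there exists $\epsilon > 0$, not depending on $a$, $b$ or
$\ell$, such that

(3.7)  $$E_\theta\big|n^{7/10}\{D^{(1)}(n^{-1/5}s,\theta) - D^{(1)}(n^{-1/5}t,\theta)\}\big|^{2\ell} \le C_2(a,b,\ell)\,|s-t|^{\epsilon\ell},$$

(3.8)  $$E_\theta\big|n^{7/10}\{\delta^{(1)}(n^{-1/5}s,\theta) - \delta^{(1)}(n^{-1/5}t,\theta)\}\big|^{2\ell} \le C_2(a,b,\ell)\,|s-t|^{\epsilon\ell},$$

for all $\theta \in \Theta$ and $a < s < t < b$.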


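LEMMA 3.3. For some $\epsilon > 0$ and any $0 < a < b < \infty$,

(3.9)  $$\sup_{\theta\in\Theta} P_\theta\Big[\sup_{a<t<b}\big\{|D^{(1)}(n^{-1/5}t,\theta)| + |\delta^{(1)}(n^{-1/5}t,\theta)|\big\} > n^{-3/5-\epsilon}\Big] \to 0.$$

Furthermore, for any $\xi > 0$ and $\eta > 0$,

(3.10)  $$\sup_{\theta\in\Theta} P_\theta\Big[\sup_{|t - n^{1/5}h_\theta| \le n^{-\xi}} n^{7/10}\big\{|D^{(1)}(n^{-1/5}t,\theta) - D^{(1)}(h_\theta,\theta)| + |\delta^{(1)}(n^{-1/5}t,\theta) - \delta^{(1)}(h_\theta,\theta)|\big\} > \eta\Big] \to 0.$$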


LEMMA 3.4. For any $\epsilon > 0$,

$$\sup_{\theta\in\Theta} P_\theta\big(|\hat h_\theta - h_\theta| > \epsilon\, n^{-1/5}\big) \to 0.$$

PROOF. It suffices to show that for any sequence of choices
$\theta_1 = \theta_{1n} \in \Theta$, and for each $\epsilon > 0$,

(3.11)  $$P_{\theta_1}\big(|\hat h_{\theta_1} - h_{\theta_1}| > \epsilon\, n^{-1/5}\big) \to 0.$$

We may easily prove that for some $b > 0$,
$P_{\theta_1}(n^{-b} \le \hat h_{\theta_1} \le n^{b}) \to 1$. Let $H$ be a
set of bandwidths in the range $[n^{-b}, n^{b}]$, and such that
$\#(H) \le n^{a}$ for some $a > 0$. Arguing as in the proofs of Lemmas 2 and
4 of Stone [21] we may show that for each $\epsilon > 0$,

(3.12)  $$P_{\theta_1}\Big(\sup_{h\in H} |\Delta(h,\theta_1) - M(h,\theta_1)| / M(h,\theta_1) > \epsilon\Big) \to 0.$$

Now use Hölder continuity of $K$ to show that for any (random) bandwidth
$\hat h$ with $P_{\theta_1}(n^{-b} \le \hat h \le n^{b}) \to 1$,

$$P_{\theta_1}\big\{|\Delta(\hat h,\theta_1) - M(\hat h,\theta_1)| / M(\hat h,\theta_1) > \epsilon\big\} \to 0.$$

Finally invoke (3.2), to obtain (3.11). □

Recall that $\hat h_c$ is the cross-validatory window, chosen to minimise the
function $CV(h)$ at (2.2).

LEMMA 3.5. For any $\epsilon > 0$,

$$\sup_{\theta\in\Theta} P_\theta\big(|\hat h_c - h_\theta| > \epsilon\, n^{-1/5}\big) \to 0.$$

PROOF. Again, it suffices to prove that for any $\epsilon > 0$ and sequence
$\theta_1 = \theta_{1n} \in \Theta$,

(3.13)  $$P_{\theta_1}\big(|\hat h_c - h_{\theta_1}| > \epsilon\, n^{-1/5}\big) \to 0,$$

and it is easily shown that for some $b > 0$,
$P_{\theta_1}(n^{-b} \le \hat h_c \le n^{b}) \to 1$. Define

$$CV(h,\theta) = CV(h) + \int \theta^2 + 2 n^{-1} (n-1)^{-1} (n+1) \sum_{i=1}^{n} \{\theta(X_i) - E_\theta\, \theta(X_i)\},$$

and let $H$ be as in the proof of Lemma 3.4. Minimising $CV$ is equivalent to
minimising $CV(\cdot,\theta)$, for any $\theta$, because the added terms do
not depend on $h$. Using the argument leading to Stone's [21] Lemmas 2, 3 and
4, we may show that for any $\epsilon > 0$,

$$P_{\theta_1}\Big\{\sup_{h\in H} |CV(h,\theta_1) - M(h,\theta_1)| / M(h,\theta_1) > \epsilon\Big\} \to 0.$$

This formula serves as an analogue of (3.12) in the proof of Lemma 3.4. The
proof of (3.13) may now be completed as was that proof. □

LEMMA 3.6. For some $\epsilon > 0$,

$$\sup_{\theta\in\Theta} P_\theta\big(|\hat h_\theta - h_\theta| + |\hat h_c - h_\theta| > n^{-(1/5)-\epsilon}\big) \to 0.$$

PROOF. Argue as in Lemma 3.3 of [9], but use Lemmas 3.4 and 3.5 above to
replace the limit theorems $\hat h_0/h_0 \to 1$ and $\hat h_c/h_0 \to 1$ (in
notation of [9]), and use our Lemma 3.3 in place of Lemma 3.2 of [9]. □


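LEMMA 3.7.

$$\lim_{\lambda\to\infty}\ \limsup_{n\to\infty}\ \sup_{\theta\in\Theta} P_\theta\big(|\hat h_\theta - h_\theta| > \lambda\, n^{-3/10}\big) = 0.$$

PROOF. It suffices to show that for any sequences
$\theta_1 = \theta_{1n} \in \Theta$ and $\lambda_n \uparrow \infty$,

(3.14)  $$P_{\theta_1}\big(|\hat h_{\theta_1} - h_{\theta_1}| > \lambda_n\, n^{-3/10}\big) \to 0.$$

Observe that, since $M^{(1)}(h_{\theta_1},\theta_1) = 0$,

(3.15)  $$0 = \Delta^{(1)}(\hat h_{\theta_1},\theta_1) = M^{(1)}(\hat h_{\theta_1},\theta_1) + D^{(1)}(\hat h_{\theta_1},\theta_1) = (\hat h_{\theta_1} - h_{\theta_1})\,M^{(2)}(h^*,\theta_1) + D^{(1)}(\hat h_{\theta_1},\theta_1),$$

where $h^*$ lies in between $\hat h_{\theta_1}$ and $h_{\theta_1}$. Define
$c_1 = c_1(n)$ and $c_2 = c_2(n)$ by $h_{\theta_1} = c_1 n^{-1/5}$ and
$M^{(2)}(h_{\theta_1},\theta_1) = c_2 n^{-2/5}$. Then $c_1$ and $c_2$ are
bounded away from zero and infinity as $n \to \infty$ (note (3.1) and (3.3)
from Lemma 3.1). Given any $\xi > 0$, there exists $\eta(\xi) > 0$ such that
$\eta(\xi) \to 0$ as $\xi \to 0$ and for large $n$,

$$\sup_{|h - h_{\theta_1}| \le \xi n^{-1/5}} |M^{(2)}(h,\theta_1) - M^{(2)}(h_{\theta_1},\theta_1)| \le \eta(\xi)\, n^{-2/5}.$$

(Note (3.4) of Lemma 3.1.) Let $a_1$, $b_1$ be fixed positive lower, upper
bounds to $c_1$, respectively, and let $a_2$ be a fixed positive lower bound
to $c_2$. Choose $\xi \in (0, \frac12 a_1)$ so small that
$\eta(\xi) \le \frac12 a_2$.

By (3.15),

$$|\hat h_{\theta_1} - h_{\theta_1}| \le (\tfrac12 a_2 n^{-2/5})^{-1}\,|D^{(1)}(\hat h_{\theta_1},\theta_1)| \le 2 a_2^{-1} n^{2/5} \sup_{\frac12 a_1 \le t \le \frac12 a_1 + b_1} |D^{(1)}(n^{-1/5}t,\theta_1)|$$

whenever the event
$E_1 = \{|\hat h_{\theta_1} - h_{\theta_1}| \le \xi n^{-1/5}\}$ holds. Let
$E_2$ be the event

$$\Big\{\sup_{a \le t \le b} |D^{(1)}(n^{-1/5}t,\theta_1)| \le n^{-3/5-\epsilon}\Big\},$$

where $a = \frac12 a_1$, $b = \frac12 a_1 + b_1$, and $\epsilon$ is as in
(3.9) of Lemma 3.3. Whenever $E_1 \cap E_2$ holds, so does the event
$E_3 = \{|\hat h_{\theta_1} - h_{\theta_1}| \le 2 a_2^{-1} n^{-(1/5)-\epsilon}\}$.
Let $E_4$ be the event that

$$|D^{(1)}(\hat h_{\theta_1},\theta_1) - D^{(1)}(h_{\theta_1},\theta_1)| > n^{-7/10}.$$

Then:

(3.16)  $$P_{\theta_1}\big(|\hat h_{\theta_1} - h_{\theta_1}| > \lambda_n n^{-3/10}\big) \le P_{\theta_1}(\tilde E_1) + P_{\theta_1}(\tilde E_2) + P_{\theta_1}(E_3 \cap E_4) + P_{\theta_1}\big\{|D^{(1)}(h_{\theta_1},\theta_1)| > \lambda_n n^{-3/10} (2 a_2^{-1} n^{2/5})^{-1} - n^{-7/10}\big\}.$$

Chebychev's inequality and (3.5) of Lemma 3.2 show that the last-written
probability converges to zero as $n \to \infty$. Lemma 3.4 gives
$P_{\theta_1}(\tilde E_1) \to 0$, (3.9) of Lemma 3.3 gives
$P_{\theta_1}(\tilde E_2) \to 0$, and (3.10) of Lemma 3.3 gives
$P_{\theta_1}(E_3 \cap E_4) \to 0$. Result (3.14) now follows from (3.16). □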


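LEMMA 3.8.

$$\lim_{\lambda\to\infty}\ \limsup_{n\to\infty}\ \sup_{\theta\in\Theta} P_\theta\big(|\hat h_c - h_\theta| > \lambda\, n^{-3/10}\big) = 0.$$

PROOF. Use essentially the argument employed to prove Lemma 3.7, but
replace (3.15) by

$$0 = CV^{(1)}(\hat h_c) = M^{(1)}(\hat h_c,\theta_1) + D^{(1)}(\hat h_c,\theta_1) + \delta^{(1)}(\hat h_c,\theta_1) = (\hat h_c - h_{\theta_1})\,M^{(2)}(h^*,\theta_1) + D^{(1)}(\hat h_c,\theta_1) + \delta^{(1)}(\hat h_c,\theta_1),$$

where $h^*$ lies in between $\hat h_c$ and $h_{\theta_1}$. □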

We pause to introduce further notation. Let $\pi$ be a kernel function, and
let

$$\hat p(x\,|\,h) = (nh)^{-1} \sum_{i=1}^{n} \pi\{(x - X_i)/h\}$$

be the corresponding density estimator. Set

$$s_v(h,\theta) = \int_{C_v} \{\hat p(x\,|\,h) - \theta(x)\}\,\gamma(x)\,dx,$$

where $\gamma$ is as in Section 2.


LEMMA 3.9. Assume $\pi$ is Hölder continuous, vanishes outside a compact
interval, and satisfies $\int \pi(x)\,dx = 1$, $\int x\,\pi(x)\,dx = 0$. Then
for each $0 < a < b < \infty$ and each $\epsilon > 0$,

$$\sup_{\theta\in\Theta} P_\theta\Big\{\sup_{v;\ a<t<b} |s_v(n^{-1/5}t,\theta)| > n^{-1+\epsilon}\Big\} \to 0.$$

PROOF. Using Hölder continuity of $\pi$ and the fact that $\pi$ vanishes
outside a compact interval, we may choose $\lambda > 0$ so large that

$$|s_v(n^{-1/5}s,\theta) - s_v(n^{-1/5}t,\theta)| \le C_1 n^{-1}$$

uniformly in $n \ge 1$, $\theta \in \Theta$, $v$, $a \le s \le t \le b$ with
$|s - t| \le n^{-\lambda}$, and samples $X_1, \ldots, X_n$. Partition $(a,b)$
in the manner $a = t_0 < t_1 < \cdots < t_{N-1} \le b < t_N$, where each
$t_{i+1} - t_i = n^{-\lambda}$. It suffices to show that for each
$\epsilon > 0$,

(3.17)  $$\sup_{\theta\in\Theta} P_\theta\Big\{\sup_{v,\,i} |s_v(n^{-1/5}t_i,\theta)| > n^{-1+\epsilon}\Big\} \to 0.$$

Let $\ell \ge 1$ be an integer, let $\|C_v\|$ denote the length of $C_v$, and
notice that

$$|s_v(h,\theta)|^{2\ell} \le \|C_v\|^{2\ell-1} \int_{C_v} |\{\hat p(x\,|\,h) - \theta(x)\}\,\gamma(x)|^{2\ell}\,dx \le C_2\, n^{-(6\ell-1)/5} \int_{C_v} \{\hat p(x\,|\,h) - \theta(x)\}^{2\ell}\,dx$$

uniformly in $\theta$, $h$ and $v$. Therefore the left-hand side of (3.17) is
dominated by

(3.18)  $$\sum_i \sum_v \sup_\theta P_\theta\{|s_v(n^{-1/5}t_i,\theta)| > n^{-1+\epsilon}\} \le \sum_i \sum_v \sup_\theta E_\theta\{|n^{1-\epsilon} s_v(n^{-1/5}t_i,\theta)|^{2\ell}\}$$
$$\le C_2\, n^{(4\ell+1)/5 - 2\epsilon\ell} \sum_i \sup_\theta E_\theta \int \{\hat p(x\,|\,n^{-1/5}t_i) - \theta(x)\}^{2\ell}\,dx \le C_3\, n^{(4\ell+1)/5 - 2\epsilon\ell} \sum_i n^{-4\ell/5}.$$

The last inequality uses the following facts:

$$\{\hat p(x\,|\,n^{-1/5}t_i) - \theta(x)\}^{2\ell} \le C_4 \{\hat p(x\,|\,n^{-1/5}t_i) - E_\theta \hat p(x\,|\,n^{-1/5}t_i)\}^{2\ell} + C_4 \{E_\theta \hat p(x\,|\,n^{-1/5}t_i) - \theta(x)\}^{2\ell},$$

$$\{E_\theta \hat p(x\,|\,n^{-1/5}t_i) - \theta(x)\}^{2\ell} \le C_5\, n^{-4\ell/5},$$

$$E_\theta \{\hat p(x\,|\,n^{-1/5}t_i) - E_\theta \hat p(x\,|\,n^{-1/5}t_i)\}^{2\ell} \le C_6\, n^{-4\ell/5},$$

all uniformly in $x$, $i$ and $\theta$, the latter by an inequality for
centered sums of independent random variables (formula (21.4) of Burkholder
[3]); and the integrand in (3.18) vanishes outside a compact set, independent
of $i$ and $\theta$. The number of summands in the sum over $i$ in (3.18) is
of order $n^{\lambda}$, and so if we choose $\ell$ so large that
$\lambda + (1/5) - 2\epsilon\ell < 0$, the right-hand side of (3.18)
converges to zero as $n \to \infty$. This proves (3.17). □

LEMMA 3.10. For each $0 < a < b < \infty$ and all positive integers $\ell$,

$$\sup_{n,\ \theta\in\Theta,\ a<t<b} E_\theta\big|n^{1/2} D^{(2)}(n^{-1/5}t,\theta)\big|^{2\ell} \le C(a,b,\ell).$$

One consequence of this result and (3.1) of Lemma 3.1 is that for each
$\epsilon > 0$,

(3.19)  $$\sup_{\theta\in\Theta} P_\theta\{n^{2/5} |D^{(2)}(h_\theta,\theta)| > \epsilon\} \to 0.$$

PROOF OF LEMMA 3.10. Use the argument employed to derive (3.5) of
Lemma 3.2. □

4.  Main  proofs. 

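Theorem 2.2 is immediate from Lemmas 3.7 and 3.8. (Remember that we are
taking $t = 2$ throughout, to simplify notation.) The remainder of this
paper is devoted to proving Theorem 2.1. The classification argument used
by Stone [20] and Marron [11] is an important element of our proof.

Given a data-driven bandwidth $\hat h$, define $\hat\theta$ to be any element
of $\Theta$ such that
$|\hat h_{\hat\theta} - \hat h| = \inf_{\theta\in\Theta} |\hat h_\theta - \hat h|$.
Then $\hat h_{\hat\theta}$ is also a data-driven bandwidth; that is, it is a
function of $n$ and $X_1, \ldots, X_n$ alone; it does not employ any
additional knowledge about the unknown density. For each
$\theta_1 \in \Theta$, the triangle inequality and the definition of
$\hat\theta$ give

$$|\hat h_{\hat\theta} - \hat h_{\theta_1}| \le |\hat h_{\hat\theta} - \hat h| + |\hat h - \hat h_{\theta_1}| \le 2\,|\hat h - \hat h_{\theta_1}|.$$

Therefore result (2.1) will follow if we prove it for $\hat h_{\hat\theta}$
instead of $\hat h$:

(4.1)  $$\lim_{\epsilon\to 0}\ \liminf_{n\to\infty}\ \sup_{\theta\in\Theta} P_\theta\big(|\hat h_{\hat\theta} - \hat h_\theta| > \epsilon\, n^{-3/10}\big) = 1.$$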

Choose $0 < a_1 < b_1 < \infty$ such that
$2 a_1 \le n^{1/5} h_\theta \le \frac12 b_1$ for all $n$ and all
$\theta \in \Theta$. (Note (3.1) of Lemma 3.1.) We keep $a_1$, $b_1$ fixed
throughout this section. Define $h'_{\hat\theta} = \hat h_{\hat\theta}$ if
$a_1 n^{-1/5} \le \hat h_{\hat\theta} \le b_1 n^{-1/5}$, and
$h'_{\hat\theta} = \frac12 (a_1 + b_1)\, n^{-1/5}$ otherwise. Set
$L(z) = -z K'(z)$, and observe that $\int L(z)\,dz = 1$ and
$\int z L(z)\,dz = 0$. (In the case of general $t$, if $K$ satisfies (1.1)
then so does $L$, although with $d_K$ replaced by $(t+1)\,d_K$.) Let

$$\hat g(x\,|\,h) = (nh)^{-1} \sum_{i=1}^{n} L\{(x - X_i)/h\}$$

be the density estimate constructed using kernel $L$ instead of $K$. Define

$$\zeta(\theta) = \int \{\hat f(x\,|\,h'_{\hat\theta}) - \hat g(x\,|\,h'_{\hat\theta})\}\{\hat\theta(x) - \theta(x)\}\,dx,$$

for $\theta \in \Theta$. The first step in establishing Theorem 2.1 is to
prove:

PROPOSITION 4.1. Given $\eta_1 > 0$, we may choose $\eta_2 > 0$ and a
sequence $\theta_1 = \theta_{1n} \in \Theta$ such that, for all large $n$,

$$P_{\theta_1}\{|\zeta(\theta_1)| > \eta_2\, n^{-9/10}\} > 1 - \eta_1.$$

The proof is via a sequence of three lemmas. Let $P_0$ be the probability
measure defined by

$$P_0(E) = 2^{-m} \sum_{\theta\in\Theta} P_\theta(E),$$

and let $E_0$ denote expectation with respect to $P_0$. Under $P_0$,
$\theta$ should be regarded as a random variable. There are precisely $2^m$
elements in $\Theta$.
Writing $\theta = (1 + \sum_v \tau_v \gamma_v) f_0$ and
$\hat\theta = (1 + \sum_v \hat\tau_v \gamma_v) f_0$ for sequences
$\{\tau_v\}$ and $\{\hat\tau_v\}$ of 0's and 1's, we see that

$$\zeta(\theta) = -c_0\, S,$$

where $c_0$ is the constant value taken by $f_0$ on $\cup_v C_v$,

$$S = \sum_v (\tau_v - \hat\tau_v)\, w_v,$$

and

$$w_v = \int_{C_v} \{\hat f(x\,|\,h'_{\hat\theta}) - \hat g(x\,|\,h'_{\hat\theta})\}\,\gamma(x)\,dx.$$

(The function $\gamma$ was defined in Section 2.) Notice that $S$ depends on
$\theta$ only through the indicators $\tau_v$; this observation is crucial to
our argument. Let $X$ denote the sample $X_1, \ldots, X_n$. Under the
probability measure $P_0$, and conditional on $X$, the $\tau_v$'s are
independent Bernoulli random variables with

(4.2)  $$q_v \equiv P_0(\tau_v = 1\,|\,X) = \Big[\prod\nolimits^{(v)} \{1 + \gamma(X_i)\}\Big] \Big[1 + \prod\nolimits^{(v)} \{1 + \gamma(X_i)\}\Big]^{-1},$$

where $\prod^{(v)}$ denotes the product over indices $i$ with
$X_i \in C_v$. Thus,

$$\mu \equiv E_0(S\,|\,X) = \sum_v (q_v - \hat\tau_v)\, w_v,$$

$$\sigma^2 \equiv \mathrm{var}_0(S\,|\,X) = \sum_v q_v (1 - q_v)\, w_v^2,$$

$$\rho \equiv \sum_v E_0\{|(\tau_v - q_v)\, w_v|^3\,|\,X\} \le \sum_v |w_v|^3.$$

The next two lemmas describe asymptotic properties of $\sigma$ and $\rho$.
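Formula (4.2) and the conditional moments of $S$ are simple functions of the sample and can be computed as follows. This is a minimal sketch under our own assumptions: `gamma` is any user-supplied vectorised perturbation function with $|\gamma| < 1$ (the example in the test is hypothetical and is not the $\gamma$ of Section 2), the cells are $C_v = [v/m, (v+1)/m)$, and the weights `w` and indicators `tau_hat` are taken as given arrays.

```python
import numpy as np

def posterior_tau_probs(sample, m, gamma):
    """q_v = P_0(tau_v = 1 | X) for the cells C_v = [v/m, (v+1)/m), v = 0..m-1."""
    sample = np.asarray(sample, dtype=float)
    # Observations outside [0, 1) fall in no cell and do not enter any product.
    x = sample[(sample >= 0.0) & (sample < 1.0)]
    cell = np.minimum(np.floor(x * m).astype(int), m - 1)
    # log of prod^{(v)} {1 + gamma(X_i)}; an empty cell has log-product 0,
    # so its posterior probability stays at the prior value 1/2.
    log_prod = np.bincount(cell, weights=np.log1p(gamma(x)), minlength=m)
    return 1.0 / (1.0 + np.exp(-log_prod))

def conditional_moments(q, tau_hat, w):
    """mu = sum (q_v - tau_hat_v) w_v and sigma^2 = sum q_v (1 - q_v) w_v^2."""
    q, tau_hat, w = (np.asarray(a, dtype=float) for a in (q, tau_hat, w))
    mu = float(np.sum((q - tau_hat) * w))
    sigma2 = float(np.sum(q * (1.0 - q) * w**2))
    return mu, sigma2
```

Working with the logarithm of the product avoids overflow or underflow when a cell contains many observations; the map from the log-product to $q_v$ is the logistic function, which makes it plain that $q_v$ stays away from 0 and 1 exactly when the log-product stays bounded.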


LEMMA 4.1. There exist fixed constants $0 < d_1 < d_2 < \infty$ such that

$$P_0(d_1 n^{-9/5} \le \sigma^2 \le d_2 n^{-9/5}) \to 1.$$

PROOF. Let $N_v$ denote the number of elements of $X$ within $C_v$, and
notice that the $P_\theta$-distribution of the sequence $\{N_v\}$ does not
depend on $\theta$. Observe that for a constant $c > 0$,
$E_\theta(N_v) = E_0(N_1) \sim c\, n^{4/5}$. Therefore for large $n$,

$$P_\theta(N_v > 3c\, n^{4/5} \text{ for some } v) \le C n^{1/5}\, P_0(N_1 > 3c\, n^{4/5})$$
$$\le C n^{1/5}\, P_0\{|N_1 - E_0(N_1)| > c\, n^{4/5}\}$$
$$\le C n^{1/5} (c\, n^{4/5})^{-2}\, E_0\{|N_1 - E_0(N_1)|^2\}$$
$$= O(n^{-3/5}).$$

Thus, if $E$ is the event that no interval $C_v$ contains more than
$3c\, n^{4/5}$ elements of $X$, then
$\inf_{\theta\in\Theta} P_\theta(E) \to 1$.

Let $\sum^{(v)}$ denote summation over indices $i$ with $X_i \in C_v$, and
observe that

$$\prod\nolimits^{(v)} \{1 + \gamma(X_i)\} = \exp\Big[\sum_{j=1}^{\infty} (-1)^{j+1} j^{-1} \sum\nolimits^{(v)} \{\gamma(X_i)\}^j\Big] = \exp\big(T_1^{(v)} + T_2^{(v)}\big),$$

where

$$T_1^{(v)} = \sum\nolimits^{(v)} \gamma(X_i), \qquad |T_2^{(v)}| \le \sum_{j=2}^{\infty} j^{-1} \sum\nolimits^{(v)} |\gamma(X_i)|^j.$$

Bearing in mind that $\sup|\gamma| \le C n^{-2/5}$, we may easily prove that
on the set $E$ and for all large $n$, $|T_2^{(v)}| \le C_2$ uniformly in $v$.
Thus, for each $z > 0$ there exist numbers
$0 < a_2(z) \le b_2(z) < \infty$ such that, on the set
$\{|T_1^{(v)}| \le z\} \cap E$,

$$a_2(z) \le \prod\nolimits^{(v)} \{1 + \gamma(X_i)\} \le b_2(z)$$

for all $v$. Remembering the definition (4.2) of $q_v$ in terms of
$\prod^{(v)}\{1 + \gamma(X_i)\}$, we now deduce the existence of a positive,
decreasing function $a(z) \le \frac12$, such that on $E$,
$|T_1^{(v)}| \le z$ implies $|q_v - \frac12| \le \frac12 - a(z)$. Since
$q_v(1 - q_v) = \frac14 - (q_v - \frac12)^2$, it follows that on $E$,

(4.3)  $$\tfrac12 a(z) \sum_v w_v^2\, I\Big\{\Big|\sum\nolimits^{(v)} \gamma(X_i)\Big| \le z\Big\} \le \sigma^2 \le \tfrac14 \sum_v w_v^2,$$

for all $z > 0$.

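Let

$$w_v(h,z) \equiv I\Big\{\Big|\sum\nolimits^{(v)} \gamma(X_i)\Big| \le z\Big\} \int_{C_v} \{\hat f(x\,|\,h) - \hat g(x\,|\,h)\}\,\gamma(x)\,dx,$$

$\mu_v(h,z,\theta) = E_\theta\{w_v^2(h,z)\}$ and
$\mu(h,z,\theta) = \sum_v \mu_v(h,z,\theta)$. We claim that the function
$c(n,h,z,\theta)$ defined by $\mu(h,z,\theta) = c(n,h,z,\theta)\, n^{-9/5}$
is bounded away from zero and infinity uniformly in $n \ge 1$,
$h \in n^{-1/5}(a_1,b_1)$, $z \ge z_0$ and $\theta \in \Theta$, for some
$z_0 > 0$. This is relatively easy to verify if we take $z_0 = \infty$. To
see that $z_0 < \infty$ is permissible, notice that

$$\mu_v(h,z,\theta) \ge \mu_v(h,\infty,\theta) - \Big[P_\theta\Big\{\Big|\sum\nolimits^{(v)} \gamma(X_i)\Big| > z\Big\}\Big]^{1/2} \big[E_\theta\{w_v^4(h,\infty)\}\big]^{1/2};$$

$E_\theta\{w_v^4(h,\infty)\} \le C_1 n^{-4}$ uniformly in
$h \in n^{-1/5}(a_1,b_1)$ and $\theta \in \Theta$; and

$$P_\theta\Big\{\Big|\sum\nolimits^{(v)} \gamma(X_i)\Big| > z\Big\} \le z^{-2} E_\theta\Big\{\Big|\sum\nolimits^{(v)} \gamma(X_i)\Big|^2\Big\}$$
$$= z^{-2} E_\theta\big[N_v\, E\{\gamma^2(X_1)\,|\,X_1 \in C_v\} + (N_v^2 - N_v)\, E\{\gamma(X_1)\gamma(X_2)\,|\,X_1, X_2 \in C_v\}\big]$$
$$\le z^{-2} C_2 \Big[E_\theta(N_v)\, \|C_v\|^{-1} \int_{C_v} \gamma^2(x)\,dx + E_\theta(N_v^2) \Big\{\|C_v\|^{-1} \int_{C_v} \tau_v \gamma^2(x)\,dx\Big\}^2\Big] \le C_3\, z^{-2}$$

uniformly in $v$ and $\theta$. (Remember $\|C_v\|$ denotes the length of
$C_v$.) Consequently,

$$\mu_v(h,z,\theta) \ge \mu_v(h,\infty,\theta) - (C_1 C_3)^{1/2}\, n^{-2} z^{-1}$$

uniformly in $v$, $h$, $z$ and $\theta$. Adding this inequality over $v$, we
see that the stated properties of the function $c$ are available for some
finite $z_0 > 0$.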

Take $z = z_0$ in (4.3). In view of the properties of $c$ established in the previous paragraph, Lemma 4.1 will follow via (4.3), if we prove that for each $\epsilon > 0$, and for $z = z_0$ and $z = \infty$,

(4.4)  $\sup_{\theta\in\Theta} P_\theta[\,\sup_{a_1<t<b_1} |\sum_v \{\tilde w_v^2(n^{-1/5}t,z) - \mu_v(n^{-1/5}t,z,\theta)\}| > \epsilon n^{-9/5}] \to 0$.

Using Hölder continuity of $K$ and $L$, and the fact that these functions have compact support, we may choose $\lambda > 0$ so large that

(4.5)  $\sum_v \{|\tilde w_v^2(n^{-1/5}s,z) - \tilde w_v^2(n^{-1/5}t,z)| + |\mu_v(n^{-1/5}s,z,\theta) - \mu_v(n^{-1/5}t,z,\theta)|\} \le C n^{-2}$

uniformly in $n$, $z = z_0$ and $\infty$, $\theta$, $a_1 < s < t < b_1$ with $|s-t| \le n^{-\lambda}$, and samples $X_1,\ldots,X_n$. Let $a_1 = t_0 < t_1 < \ldots < t_{\nu-1} < b_1 \le t_\nu$ be a partition of $(a_1,b_1)$ with $t_i - t_{i-1} = n^{-\lambda}$ for each $i$. In view of (4.5), result (4.4) will follow if we show that for each $\epsilon > 0$,

    $p_\theta \equiv \sum_i P_\theta[\,|\sum_v \{\tilde w_v^2(n^{-1/5}t_i,z) - \mu_v(n^{-1/5}t_i,z,\theta)\}| > \epsilon n^{-9/5}]$

converges to zero uniformly in $\theta \in \Theta$ and $z = z_0$ and $\infty$.

Since $K$ and $L$ have compact support, and each $t_i \in (a_1,b_1)$, then for each $i$ we may divide the subscripts $v$ among a fixed finite number $k$ (not depending on $i$ or $n$) of sets $V_{i1},\ldots,V_{ik}$ such that for each $i$ and $j$, and for $z = z_0$ and $\infty$, the variables $\tilde w_v^2(n^{-1/5}t_i,z)$, $v \in V_{ij}$, are stochastically independent, and for each $i$, each subscript $v$ is contained in just one set $V_{ij}$. Consequently, for all integers $\ell \ge 1$,

    $p_\theta \le \sum_i \sum_{j=1}^{k} P_\theta[\,|\sum_{v\in V_{ij}} \{\tilde w_v^2(n^{-1/5}t_i,z) - \mu_v(n^{-1/5}t_i,z,\theta)\}| > \epsilon k^{-1} n^{-9/5}]$
      $\le \sum_i \sum_{j=1}^{k} E_\theta[\,|(\epsilon k^{-1} n^{-9/5})^{-1} \sum_{v\in V_{ij}} \{\tilde w_v^2(n^{-1/5}t_i,z) - \mu_v(n^{-1/5}t_i,z,\theta)\}|^{2\ell}]$.

An inequality for moments of sums of independent random variables [3, formula (21.4)] now gives

    $p_\theta \le C_1(\ell)\,(\epsilon^{-1}k)^{2\ell}\, n^{18\ell/5} \sum_i \sum_{j=1}^{k} [\{\sum_{v\in V_{ij}} E_\theta(Y_{iv}^2)\}^{\ell} + \sum_{v\in V_{ij}} E_\theta(|Y_{iv}|^{2\ell})]$,

where $Y_{iv} = \tilde w_v^2(n^{-1/5}t_i,z) - \mu_v(n^{-1/5}t_i,z,\theta)$. The same moment inequality gives $E_\theta(|Y_{iv}|^{2\ell}) \le C_2 n^{-4\ell}$ uniformly in $i$, $v$ and $\theta$. Since the number of partition points $t_i$ is of order $n^{\lambda}$, then

    $\sup_\theta p_\theta \le C_3\, n^{18\ell/5} \sum_i \{(n^{1/5} n^{-4})^{\ell} + n^{1/5} n^{-4\ell}\} = O(n^{\lambda - \ell/5}) \to 0$,

provided only that $\ell > 5\lambda$. □
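The exponent bookkeeping in the last display is easy to get wrong by hand, so here is a minimal sketch that redoes it with exact fractions. It assumes our reading of the bound: a prefactor $n^{18\ell/5}$, of order $n^{\lambda}$ partition points, of order $n^{1/5}$ indices $v$ in each set, and moments of order $n^{-4}$ and $n^{-4\ell}$. The function name is hypothetical.

```python
from fractions import Fraction as F


def final_exponents(ell, lam):
    """Exponents of n in the two terms of the final bound for sup p_theta.

    ell : the integer moment order (2*ell-th moments are used)
    lam : the partition exponent (the mesh is n**(-lam))
    """
    prefactor = F(18, 5) * ell                 # from (n**(-9/5))**(-2*ell)
    first = lam + prefactor + ell * (F(1, 5) - 4)    # {sum_v E(Y^2)}**ell term
    second = lam + prefactor + F(1, 5) - 4 * ell     # sum_v E|Y|**(2*ell) term
    return first, second


def both_vanish(ell, lam):
    """True when both terms tend to zero as n grows."""
    return all(e < 0 for e in final_exponents(ell, lam))
```

The first exponent simplifies to $\lambda - \ell/5$ and the second to $\lambda + 1/5 - 2\ell/5$. Since $\ell \ge 1$ the second never exceeds the first, which is why the single condition $\ell > 5\lambda$ suffices.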

LEMMA 4.2.  For each $\epsilon > 0$,

    $P_\theta(\beta > n^{-14/5+\epsilon}) \to 0$.

PROOF.  The argument used to prove Lemma 4.1 shows that for some $c_3 > 0$,

    $P_\theta(\sum_v w_v^2 > c_3 n^{-9/5}) \to 0$.

Applying Lemma 3.9 twice, once with $p = f$ and once with $p = g$, we obtain:

    $P_\theta(\sup_v |w_v| > n^{-1+\epsilon}) \to 0$.

Lemma 4.2 follows on combining these results. □


Let $\Phi$ be the standard normal distribution function.

LEMMA 4.3.  For some fixed $c > 0$,

    $\liminf_{n\to\infty}\, \inf_{x>0}\, [P_\theta(|S| > n^{-9/10}x) - 2\{1 - \Phi(cx)\}] \ge 0$.

PROOF.  Let $\tilde E$ denote the event that $\beta \sigma^{-3} \le n^{-1/20}$. According to Lemmas 4.1 and 4.2,

    $P_\theta(\tilde E^c) \le P_\theta(\sigma^2 \le d_1 n^{-9/5}) + P_\theta\{\beta > n^{-1/20} (d_1 n^{-9/5})^{3/2}\} \to 0$.

On the set $\tilde E$, the Berry-Esseen bound [15, page 111] gives

    $\sup_{-\infty<x<\infty} |P_\theta(S \le x \mid X) - \Phi\{(x-\mu)/\sigma\}| \le A n^{-1/20}$,

where $A$ is an absolute constant. Therefore on $\tilde E$, and for $x > 0$,

    $P_\theta(|S| > x \mid X) \ge 1 - \Phi\{(x-\mu)/\sigma\} + \Phi\{(-x-\mu)/\sigma\} - 2A n^{-1/20}$
      $\ge 2\{1 - \Phi(x/\sigma)\} - 2A n^{-1/20}$.

Taking expectations, and using Lemma 4.1 again, we obtain Lemma 4.3. □
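The second inequality in the proof uses the fact that, for fixed $x > 0$ and $\sigma > 0$, the two-sided normal tail $1 - \Phi\{(x-\mu)/\sigma\} + \Phi\{(-x-\mu)/\sigma\}$ is smallest when $\mu = 0$. A minimal numerical sketch of that fact follows. The grids and function names are our own illustrative choices, and the check is a sanity test rather than a proof.

```python
import math


def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def two_sided_tail(x, mu, sigma):
    """P(|Z| > x) for Z ~ N(mu, sigma^2)."""
    return 1.0 - Phi((x - mu) / sigma) + Phi((-x - mu) / sigma)


def centred_tail(x, sigma):
    """P(|Z| > x) for Z ~ N(0, sigma^2), i.e. 2{1 - Phi(x/sigma)}."""
    return 2.0 * (1.0 - Phi(x / sigma))


def min_gap(xs, mus, sigmas):
    """Smallest value of (two-sided tail) - (centred tail) over the grids."""
    return min(two_sided_tail(x, mu, s) - centred_tail(x, s)
               for x in xs for mu in mus for s in sigmas)


XS = [0.01 * k for k in range(1, 400)]        # x > 0
MUS = [0.05 * k for k in range(-60, 61)]      # includes mu = 0
SIGMAS = [0.25, 1.0, 3.0]
```

Evaluating `min_gap(XS, MUS, SIGMAS)` should give a value that is zero up to rounding error, attained at $\mu = 0$.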


To obtain Proposition 4.1 from Lemma 4.3, choose $x > 0$ so small that $2\{1 - \Phi(cx)\} > 1 - \tfrac12 \eta_1$, and let $\eta_2 = c_0^{-1} x$. Then for large $n$,

    $1 - \eta_1 \le P_\theta(|c_0^{-1} S| > \eta_2 n^{-9/10}) = 2^{-m} \sum_{\theta\in\Theta} P_\theta\{|\zeta(\theta)| > \eta_2 n^{-9/10}\}$.

Therefore there must exist some $\theta_1 \in \Theta$ such that

    $1 - \eta_1 \le P_{\theta_1}\{|\zeta(\theta_1)| > \eta_2 n^{-9/10}\}$.

Throughout the remainder of our proof of (4.1), we work with the "worst case" density $\theta_1 = \theta_{1n}$ specified by Proposition 4.1, for some fixed $\eta_1, \eta_2 > 0$.

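Notice that

    $(\partial/\partial h)\, f(x|h) = h^{-1}\{g(x|h) - f(x|h)\}$

and

    $\Delta(h,\theta) = \Delta(h,\theta_1) - 2\int \{f(x|h) - \theta_1(x)\}\{\theta(x) - \theta_1(x)\}\,dx + \int \{\theta(x) - \theta_1(x)\}^2\,dx$.

Differentiate the latter formula with respect to $h$, and take $h = \hat h_\theta$, obtaining:

    $0 = \Delta^{(1)}(\hat h_\theta,\theta_1) + 2\hat h_\theta^{-1} \int \{f(x|\hat h_\theta) - g(x|\hat h_\theta)\}\{\theta(x) - \theta_1(x)\}\,dx$.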
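The derivative identity $(\partial/\partial h) f(x|h) = h^{-1}\{g(x|h) - f(x|h)\}$ can be checked by finite differences. The sketch below rests on assumptions not spelled out in this section: that $f(x|h) = (nh)^{-1}\sum_i K\{(x-X_i)/h\}$, and that $g(x|h)$ is the same estimator with $K$ replaced by $L(u) = -uK'(u)$. The biweight kernel and the data are purely illustrative, and all function names are hypothetical.

```python
def biweight(u):
    """Biweight kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1]; an illustrative choice."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def biweight_L(u):
    """Assumed companion kernel L(u) = -u K'(u) = (15/4) u^2 (1 - u^2)."""
    return 15.0 / 4.0 * u * u * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_estimate(kernel, x, h, sample):
    """(n h)^{-1} * sum_i kernel((x - X_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def derivative_gap(x, h, sample, step=1e-5):
    """|central difference of f(x|h) in h  -  {g(x|h) - f(x|h)} / h|."""
    numeric = (kernel_estimate(biweight, x, h + step, sample)
               - kernel_estimate(biweight, x, h - step, sample)) / (2.0 * step)
    f = kernel_estimate(biweight, x, h, sample)
    g = kernel_estimate(biweight_L, x, h, sample)
    return abs(numeric - (g - f) / h)


# Made-up data, for illustration only.
SAMPLE = [-1.3, -0.7, -0.2, 0.1, 0.4, 0.9, 1.6]
```

For instance, `derivative_gap(0.25, 0.8, SAMPLE)` should be negligible relative to the size of the derivative itself. The same identity is what turns the derivative of the cross term in $\Delta(h,\theta)$ into the integral of $\{f - g\}\{\theta - \theta_1\}$.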


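That is,

(4.6)  $\Delta^{(1)}(\hat h_\theta,\theta_1)\,\hat h_\theta = -2\,\zeta(\theta_1)$

provided $\hat h_\theta \in n^{-1/5}(a_1,b_1)$. Expand $\Delta^{(1)}(\hat h_\theta,\theta_1)$ in a Taylor series about $\hat h_{\theta_1}$:

(4.7)  $\Delta^{(1)}(\hat h_\theta,\theta_1) = (\hat h_\theta - \hat h_{\theta_1})\,\Delta^{(2)}(h^*,\theta_1)$,

where $h^*$ lies between $\hat h_\theta$ and $\hat h_{\theta_1}$. This definition of $h^*$ is used in all that follows. Define $h^\dagger = h^*$ if $|\hat h_\theta - \hat h_{\theta_1}| \le n^{-1/4}$, and $h^\dagger = h_{\theta_1}$ otherwise.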

LEMMA 4.4.  For each $\epsilon > 0$,

    $P_{\theta_1}(n^{2/5} |\Delta^{(2)}(h^\dagger,\theta_1) - \Delta^{(2)}(h_{\theta_1},\theta_1)| > \epsilon) \to 0$.

PROOF.  From our definition of $h^\dagger$,

    $|h^\dagger - h_{\theta_1}| \le n^{-1/4} + |\hat h_{\theta_1} - h_{\theta_1}|$.

Therefore by Lemma 3.7,

(4.8)  $P_{\theta_1}(|h^\dagger - h_{\theta_1}| > 2 n^{-1/4}) \to 0$.

It now follows from (3.4) of Lemma 3.1 that

    $P_{\theta_1}\{n^{2/5} |M^{(2)}(h^\dagger,\theta_1) - M^{(2)}(h_{\theta_1},\theta_1)| > \epsilon\} \to 0$,

and since (by (4.8))

(4.9)  $P_{\theta_1}\{h^\dagger, h_{\theta_1} \in n^{-1/5}(a_1,b_1)\} \to 1$,

the proof of Lemma 4.4 will be completed if we show that

(4.10)  $P_{\theta_1}\{n^{2/5} |D^{(2)}(h^\dagger,\theta_1)| > \epsilon\} \to 0$  and  $P_{\theta_1}\{n^{2/5} |D^{(2)}(h_{\theta_1},\theta_1)| > \epsilon\} \to 0$.

Using Hölder continuity of $K$, $K'$ and $K''$, and the fact that each of these functions has compact support, we may produce $\lambda > 0$ such that

    $|D^{(2)}(n^{-1/5}s,\theta_1) - D^{(2)}(n^{-1/5}t,\theta_1)| \le C n^{-1}$

uniformly in $n \ge 1$, $s$ and $t \in (a_1,b_1)$ with $|s-t| \le n^{-\lambda}$, and samples $X_1,\ldots,X_n$. Let $a_1 = t_0 < t_1 < \ldots < t_{\nu-1} < b_1 \le t_\nu$ be a partition of $(a_1,b_1)$ such that $t_i - t_{i-1} = n^{-\lambda}$ for each $i$. In view of (4.9), to prove (4.10) it suffices to prove that

    $p \equiv \sum_i P_{\theta_1}\{n^{2/5} |D^{(2)}(n^{-1/5}t_i,\theta_1)| > \epsilon\} \to 0$.

But this is immediate from Lemma 3.10 and Markov's inequality:

    $p \le C \sum_i (\epsilon^{-1} n^{-1/10})^{2\ell} \to 0$,

provided $\ell$ is sufficiently large. □

In view of (3.19),

    $P_{\theta_1}\{n^{2/5} |\Delta^{(2)}(h_{\theta_1},\theta_1) - M^{(2)}(h_{\theta_1},\theta_1)| > \epsilon\} \to 0$

for each $\epsilon > 0$, and so by Lemma 4.4,

(4.11)  $P_{\theta_1}(n^{2/5} |\Delta^{(2)}(h^\dagger,\theta_1) - M^{(2)}(h_{\theta_1},\theta_1)| > \epsilon) \to 0$.

Noting (3.3) of Lemma 3.1, let $0 < a_2 < \tfrac12 b_2 < \infty$ be constants such that $n^{2/5} M^{(2)}(h_{\theta_1},\theta_1) \in (a_2, \tfrac12 b_2)$ for all $n$, and take $\epsilon = a_2$ in (4.11). Then

(4.12)  $P_{\theta_1}\{0 < \Delta^{(2)}(h^\dagger,\theta_1) < b_2 n^{-2/5}\} \to 1$.

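Let $\eta_1$, $\eta_2$ be as in Proposition 4.1, and set $\eta_3 = \eta_2 (b_1 b_2)^{-1}$. Let $E_1$ be the event that $|\hat h_\theta - \hat h_{\theta_1}| \le n^{-1/4}$, $E_2$ the event that $|\Delta^{(2)}(h^\dagger,\theta_1)| \le b_2 n^{-2/5}$, and $E_3$ the event that $\hat h_\theta \in n^{-1/5}(a_1,b_1)$. Remember that $h^\dagger = h^*$ on $E_1$, and that (4.6) holds on $E_3$. By (4.6) and (4.7),

(4.13)  $P_{\theta_1}(|\hat h_\theta - \hat h_{\theta_1}| > \eta_3 n^{-3/10};\ E_1)$
      $\ge P_{\theta_1}\{|\Delta^{(1)}(\hat h_\theta,\theta_1)| > \eta_3 n^{-3/10} \cdot b_2 n^{-2/5};\ E_1\} - P_{\theta_1}(E_2^c)$
      $\ge P_{\theta_1}\{|2\zeta(\theta_1)| > \eta_3 b_2 n^{-7/10} \cdot b_1 n^{-1/5};\ E_1\} - P_{\theta_1}(E_2^c) - P_{\theta_1}(E_3^c)$
      $\ge P_{\theta_1}\{|\zeta(\theta_1)| > \eta_2 n^{-9/10};\ E_1\} - P_{\theta_1}(E_2^c) - P_{\theta_1}(E_3^c)$
      $\ge P_{\theta_1}(E_1) - P_{\theta_1}(E_2^c) - P_{\theta_1}(E_3^c) - \eta_1$,

the last line following from Proposition 4.1. Result (4.12) and Lemma 3.4 imply that $P_{\theta_1}(E_2^c) \to 0$ and $P_{\theta_1}(E_3^c) \to 0$, respectively. If $n$ is so large that $\eta_3 n^{-3/10} < n^{-1/4}$, then by (4.13),

    $P_{\theta_1}(|\hat h_\theta - \hat h_{\theta_1}| > \eta_3 n^{-3/10})$
      $= P_{\theta_1}(|\hat h_\theta - \hat h_{\theta_1}| > \eta_3 n^{-3/10};\ E_1) + P_{\theta_1}(E_1^c)$
      $\ge \{P_{\theta_1}(E_1) - P_{\theta_1}(E_2^c) - P_{\theta_1}(E_3^c) - \eta_1\} + P_{\theta_1}(E_1^c)$
      $= 1 - P_{\theta_1}(E_2^c) - P_{\theta_1}(E_3^c) - \eta_1 \to 1 - \eta_1$

as $n \to \infty$. This proves (4.1), and completes the proof of Theorem 2.1.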
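The powers of $n$ in the chain (4.13) combine as $n^{-3/10} \cdot n^{-2/5} = n^{-7/10}$ and $n^{-7/10} \cdot n^{-1/5} = n^{-9/10}$, and the requirement $\eta_3 n^{-3/10} < n^{-1/4}$ is equivalent to $n > \eta_3^{20}$. The following sketch redoes this arithmetic exactly. The function names are hypothetical and the numerical value of $\eta_3$ used in any call is an arbitrary illustration.

```python
from fractions import Fraction as F


def chain_exponents():
    """Exponents of n appearing successively in the chain (4.13)."""
    after_e2 = F(-3, 10) + F(-2, 5)     # threshold for |Delta^(1)|, using E_2
    after_e3 = after_e2 + F(-1, 5)      # threshold for |2 zeta|, using E_3
    return after_e2, after_e3


def threshold_n(eta3):
    """Value n0 such that eta3 * n**(-3/10) < n**(-1/4) exactly when n > n0.

    The inequality rearranges to eta3 < n**(1/20), i.e. n > eta3**20.
    """
    return eta3 ** 20
```

Because the threshold grows like the twentieth power of $\eta_3$, "sufficiently large $n$" can be very large in practice when $\eta_3$ exceeds one, although this does not affect the asymptotic statement.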